ENH: Add use_nullable_dtypes and nullable_backend global option to read_orc #49827
@@ -33,7 +33,7 @@ sql-other, html, xml, plot, output_formatting, clipboard, compression, test]`` (
 Configuration option, ``io.nullable_backend``, to return pyarrow-backed dtypes from IO functions
 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

-A new global configuration, ``io.nullable_backend`` can now be used in conjunction with the parameter ``use_nullable_dtypes=True`` in :func:`read_parquet` and :func:`read_csv` (with ``engine="pyarrow"``)
+A new global configuration, ``io.nullable_backend`` can now be used in conjunction with the parameter ``use_nullable_dtypes=True`` in :func:`read_parquet`, :func:`read_orc` and :func:`read_csv` (with ``engine="pyarrow"``)
Off-topic, but it seems `read_excel` supports `use_nullable_dtypes` but not `io.nullable_backend`. We should fix this.
Good point. I'll add this in a follow up PR.

 .. note

 Currently only ``io.nullable_backend`` set to ``"pyarrow"`` is supported.
Do you intend to implement the flag for pandas as well?
I would want to do this in a follow up PR (unless you're interested :) )
No, that's fine. I just wanted to understand whether this is intended at all.
I want to tackle json and sql next.
"float": np.arange(4.0, 7.0, dtype="float64"),
"float_with_nan": [2.0, np.nan, 3.0],
"bool": [True, False, None],
"datetime": pd.date_range("20130101", periods=3),
Can you add bool without na?
Sure, added.
pandas/tests/io/test_orc.py
],
}
)
bytes_data = df.to_orc()
Just to avoid something subtle: can you do df.copy().to… since you are using df below?
Good idea. Added the copy
…ad_orc (pandas-dev#49827)
* ENH: Add use_nullable_dtypes and nullable_backend to read_orc
* Skip if not required pa version
* Address review
io.nullable_type="pandas"|"pyarrow" to control IO reader use_nullable_dtype #48957